
<!DOCTYPE html>
<!--[if IEMobile 7 ]><html class="no-js iem7"><![endif]-->
<!--[if lt IE 9]><html class="no-js lte-ie8"><![endif]-->
<!--[if (gt IE 8)|(gt IEMobile 7)|!(IEMobile)|!(IE)]><!--><html class="no-js" lang="en"><!--<![endif]-->
<head>
  <meta charset="utf-8">
  <title>Into Data</title>
  <meta name="author" content="Haipeng Liu">

  
  <meta name="description" content="Haipeng Liu's tech blog">
  <meta name="keywords" content="C/C++, Python, NLP, Speech Recognition">

  <!-- http://t.co/dKP3o1e -->
  <meta name="HandheldFriendly" content="True">
  <meta name="MobileOptimized" content="320">
  <meta name="viewport" content="width=device-width, initial-scale=1">

  
  <link rel="canonical" href="http://haipengliu.github.io">
  <link href="/favicon.png" rel="icon">
  <link href="/stylesheets/screen.css" media="screen, projection" rel="stylesheet" type="text/css">
  <link href="/atom.xml" rel="alternate" title="Into Data" type="application/atom+xml">
  <script src="/javascripts/modernizr-2.0.js"></script>
  <script src="//ajax.googleapis.com/ajax/libs/jquery/1.9.1/jquery.min.js"></script>
  <script>!window.jQuery && document.write(unescape('%3Cscript src="./javascripts/libs/jquery.min.js"%3E%3C/script%3E'))</script>
  <script src="/javascripts/octopress.js" type="text/javascript"></script>
  <!--Fonts from Google"s Web font directory at http://google.com/webfonts -->
<link href="//fonts.googleapis.com/css?family=PT+Serif:regular,italic,bold,bolditalic" rel="stylesheet" type="text/css">
<link href="//fonts.googleapis.com/css?family=PT+Sans:regular,italic,bold,bolditalic" rel="stylesheet" type="text/css">

<!-- mathjax config similar to math.stackexchange -->

<script type="text/x-mathjax-config">
  MathJax.Hub.Config({
    tex2jax: {
      inlineMath: [ ['$','$'], ["\\(","\\)"] ],
      processEscapes: true
    }
  });
</script>

<script type="text/x-mathjax-config">
    MathJax.Hub.Config({
      tex2jax: {
        skipTags: ['script', 'noscript', 'style', 'textarea', 'pre', 'code']
      }
    });
</script>

<script type="text/x-mathjax-config">
    MathJax.Hub.Queue(function() {
        var all = MathJax.Hub.getAllJax(), i;
        for(i=0; i < all.length; i += 1) {
            all[i].SourceElement().parentNode.className += ' has-jax';
        }
    });
</script>

<script type="text/javascript"
   src="http://cdn.mathjax.org/mathjax/latest/MathJax.js?config=TeX-AMS-MML_HTMLorMML">
</script>

  

</head>

<body   >
  <header role="banner"><hgroup>
  <h1><a href="/">Into Data</a></h1>
  
    <h2>data, code, algorithm. A developer&#8217;s blog.</h2>
  
</hgroup>

</header>
  <nav role="navigation"><ul class="subscription" data-subscription="rss">
  <li><a href="/atom.xml" rel="subscribe-rss" title="subscribe via RSS">RSS</a></li>
  
</ul>
  
<form action="https://www.google.com/search" method="get">
  <fieldset role="search">
    <input type="hidden" name="q" value="site:haipengliu.github.io" />
    <input class="search" type="text" name="q" results="0" placeholder="Search"/>
  </fieldset>
</form>
  
<ul class="main-navigation">
  <li><a href="/">Start</a></li>
  <li><a href="/blog/archives">Archives</a></li>
  <li><a href="/about">About</a></li>
</ul>

</nav>
  <div id="main">
    <div id="content">
      <div class="blog-index">
  
  
  
    <article>
      
  <header>
    
      <h1 class="entry-title"><a href="/blog/2014/12/23/how-to-use-octopress-to-write-blog/">How to Use Octopress to Write Blog on Github</a></h1>
    
    
      <p class="meta">
        




<time class='entry-date' datetime='2014-12-23T20:45:48+08:00'><span class='date'><span class='date-month'>Dec</span> <span class='date-day'>23</span><span class='date-suffix'>rd</span>, <span class='date-year'>2014</span></span> <span class='time'>8:45 pm</span></time>
        
      </p>
    
  </header>


  <div class="entry-content"><p>Author: Haipeng Liu <A href="mailto:pliu1981@gmail.com">email to me</A></br>
keywords: Octopress</br></p>

<h1>Install Octopress</h1>

<figure class='code'><div class="highlight"><table><tr><td class="gutter"><pre class="line-numbers"><span class='line-number'>1</span>
<span class='line-number'>2</span>
</pre></td><td class='code'><pre><code class=''><span class='line'>git clone git://github.com/imathis/octopress.git octopress
</span><span class='line'>cd octopress
</span></code></pre></td></tr></table></div></figure>


<h2>Installing Ruby With Rbenv</h2>

<h3>Install Rbenv</h3>

<figure class='code'><div class="highlight"><table><tr><td class="gutter"><pre class="line-numbers"><span class='line-number'>1</span>
<span class='line-number'>2</span>
<span class='line-number'>3</span>
<span class='line-number'>4</span>
<span class='line-number'>5</span>
<span class='line-number'>6</span>
</pre></td><td class='code'><pre><code class=''><span class='line'>cd
</span><span class='line'>git clone git://github.com/sstephenson/rbenv.git .rbenv
</span><span class='line'>echo 'export PATH="$HOME/.rbenv/bin:$PATH"' &gt;&gt; ~/.bash_profile
</span><span class='line'>echo 'eval "$(rbenv init -)"' &gt;&gt; ~/.bash_profile
</span><span class='line'>git clone git://github.com/sstephenson/ruby-build.git ~/.rbenv/plugins/ruby-build
</span><span class='line'>source ~/.bash_profile</span></code></pre></td></tr></table></div></figure>


</div>
  
  
    <footer>
      <a rel="full-article" href="/blog/2014/12/23/how-to-use-octopress-to-write-blog/">Read on &rarr;</a>
    </footer>
  


    </article>
  
  
    <article>
      
  <header>
    
      <h1 class="entry-title"><a href="/blog/2014/11/18/introduction-to-crf-2/">Introduction to CRF(2)</a></h1>
    
    
      <p class="meta">
        




<time class='entry-date' datetime='2014-11-18T17:27:47+08:00'><span class='date'><span class='date-month'>Nov</span> <span class='date-day'>18</span><span class='date-suffix'>th</span>, <span class='date-year'>2014</span></span> <span class='time'>5:27 pm</span></time>
        
      </p>
    
  </header>


  <div class="entry-content"><p>Author: Haipeng Liu <A href="mailto:pliu1981@gmail.com">email to me</A></br>
keywords: CRF</br></p>

<h1>Feature Engineering</h1>

<p>Features are perhaps the most important elements of the statistic model. Good feature engineering can often increase the labelling accuracy very significantly.</p>

<p>A feature function associates the labels and the local data patterns. The node feature take into account the current label $x_t$ and an element of data pattern $g_m[z, t]$. where $\delta[x_t, s]$ is the indicator function. It returns 1 if label of $x$ at $t$ is the label $s$ and 0 otherwise.</p>

<p>The data pattern function $g_m(z, t)$ takes care of the raw data and returns a (real or binary) value. For example, in noun-phrase chunking, we may have a feature $g$ the 100th.  That is, the 100th data feature receives value of 1 at current time $t$ of $10$ if the previous word is <em>today</em>.</p>

<p>&lt;&ndash;! more &ndash;></p>
</div>
  
  


    </article>
  
  
    <article>
      
  <header>
    
      <h1 class="entry-title"><a href="/blog/2014/11/15/my-first-blog/">Introduction to CRF(1)</a></h1>
    
    
      <p class="meta">
        




<time class='entry-date' datetime='2014-11-15T08:55:38+08:00'><span class='date'><span class='date-month'>Nov</span> <span class='date-day'>15</span><span class='date-suffix'>th</span>, <span class='date-year'>2014</span></span> <span class='time'>8:55 am</span></time>
        
      </p>
    
  </header>


  <div class="entry-content"><p>Author: Haipeng Liu <A href="mailto:pliu1981@gmail.com">email to me</A></br>
keywords: CRF</br></p>

<h1>Sequential labelling problem</h1>

<p><strong>Conditional random</strong> field is proposed to model the sequence data. The sequence data are the data or objects with a sequence structure.
For example, in Noun-Phrase (NP) chunking task, for a sentence, we want to provide correct phrasal labels for each word. The phrasal labels are B-NP for the beginning words of a noun-phrase, I-NP for the words inside a noun-phase and O for others. For instant, a labelled sentence in the CoNLL2000 dataset reads like this.</p>

<blockquote><p>Chancellor/O of/O the/B-NP Exchequer/I-NP Nigel/B-NP Lawson/I-NP &rsquo;s/B-NP restated/I-NP commitment/I-NP to/O a/B-NP firm/I-NP monetary/I-NP policy/I-NP has/O helped/O to/O prevent/O a/B-NP freefall/I-NP in/O sterling/B-NP over/O the/B-NP past/I-NP week/I-NP ./O</p></blockquote>

<p><em>CoNLL2000</em></p>

<p>The CRF can automatically label the noun phrase character as the human does</p>

<h1>Model the sequence</h1>

<p>To model the labeling process. We use $x$ to denote the labels and $z$ to denote the data. In our case $x$ is the sequence of noun phrase labels and $z$ is the sequence of words. $t$ is the sequence index. Also, $x$ is named hidden state variables, $z$ is named the observations. The task of CRF is to predict the state variables given the observations. That is to compute the probability:
   </div>
  
  
    <footer>
      <a rel="full-article" href="/blog/2014/11/15/my-first-blog/">Read on &rarr;</a>
    </footer>
  


    </article>
  
  <div class="pagination">
    
    <a href="/blog/archives">Blog Archives</a>
    
  </div>
</div>
<aside class="sidebar">
  
    <section>
  <h1>About Me</h1>
  <p>Haipeng Liu</p>
  <p>Currently working in Beijing China</p>
  <p>Major research area is Natural Language Processing and Machine Learning.</p>
  <p>Love coding and like to share my thoughts and ideas with you.</p>
  
  <p>Contact me at pliu1981 # gmail.com</p>
</section>
<section>
  <h1>Recent Posts</h1>
  <ul id="recent_posts">
    
      <li class="post">
        <a href="/blog/2014/12/23/how-to-use-octopress-to-write-blog/">How to Use Octopress to Write Blog on Github</a>
      </li>
    
      <li class="post">
        <a href="/blog/2014/11/18/introduction-to-crf-2/">Introduction to CRF(2)</a>
      </li>
    
      <li class="post">
        <a href="/blog/2014/11/15/my-first-blog/">Introduction to CRF(1)</a>
      </li>
    
  </ul>
</section>




<section> 
  <h1>Categories</h1> 
  <ul id="categories"> 
    <li class='category'><a href='/blog/categories/blog/'>blog (1)</a></li>
<li class='category'><a href='/blog/categories/crf/'>crf (2)</a></li>
<li class='category'><a href='/blog/categories/machine-learning/'>machine learning (2)</a></li>
<li class='category'><a href='/blog/categories/octopress/'>octopress (1)</a></li>
 
  </ul> 
</section>

  
</aside>

    </div>
  </div>
  <footer role="contentinfo"><p>
  Copyright &copy; 2014 - Haipeng Liu -
  <span class="credit">Powered by <a href="http://octopress.org">Octopress</a></span>
</p>

</footer>
  

<script type="text/javascript">
      var disqus_shortname = 'haipengliu';
      
        
        var disqus_script = 'count.js';
      
    (function () {
      var dsq = document.createElement('script'); dsq.type = 'text/javascript'; dsq.async = true;
      dsq.src = '//' + disqus_shortname + '.disqus.com/' + disqus_script;
      (document.getElementsByTagName('head')[0] || document.getElementsByTagName('body')[0]).appendChild(dsq);
    }());
</script>







  <script type="text/javascript">
    (function(){
      var twitterWidgets = document.createElement('script');
      twitterWidgets.type = 'text/javascript';
      twitterWidgets.async = true;
      twitterWidgets.src = '//platform.twitter.com/widgets.js';
      document.getElementsByTagName('head')[0].appendChild(twitterWidgets);
    })();
  </script>





</body>
</html>
